[LeetCode/LintCode] Top K Frequent Words

LeetCode version

Problem

Given a non-empty list of words, return the k most frequent elements.

Your answer should be sorted by frequency from highest to lowest. If two words have the same frequency, the word that comes first alphabetically is listed first.

Example 1:
Input: ["i", "love", "leetcode", "i", "love", "coding"], k = 2
Output: ["i", "love"]
Explanation: "i" and "love" are the two most frequent words. Note that "i" comes before "love" because it is alphabetically lower.

Example 2:
Input: ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4
Output: ["the", "is", "sunny", "day"]
Explanation: "the", "is", "sunny" and "day" are the four most frequent words, with the number of occurrences being 4, 3, 2 and 1 respectively.

Note:
You may assume k is always valid, 1 ≤ k ≤ number of unique elements.
Input words contain only lowercase letters.

Follow up:
Try to solve it in O(n log k) time and O(n) extra space.
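
Before the heap-based solution, the ordering rule can be sketched with a plain sort. This is only a baseline for checking the rule (count descending, then word ascending); it runs in O(n log n), so it does not meet the follow-up target. The class name SortBaseline is hypothetical and not part of the original post.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of the original solution.
class SortBaseline {
    // Returns the k most frequent words: higher count first, ties broken alphabetically.
    static List<String> topKFrequent(String[] words, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        List<String> unique = new ArrayList<>(counts.keySet());
        unique.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a)); // descending count
            return byCount != 0 ? byCount : a.compareTo(b);              // ascending word
        });
        return new ArrayList<>(unique.subList(0, Math.min(k, unique.size())));
    }
}
```

Running it on the two examples above should give the same lists as the expected outputs, which makes it a convenient reference when testing a faster version.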

Solution

import java.util.*;

class Solution {
    public List<String> topKFrequent(String[] words, int k) {
        List<String> res = new ArrayList<>();
        if (words.length < k) return res;
        Map<String, Integer> map = new HashMap<>();
        for (String word: words) {
            if (!map.containsKey(word)) map.put(word, 1);
            else map.put(word, map.get(word)+1);
        }
        PriorityQueue<Map.Entry<String, Integer>> queue = new PriorityQueue<>(
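            // Min-heap: lowest count at the head; on ties the alphabetically larger word is evicted first.
            // Integer objects must be compared with equals(), not ==, which only works for small cached values.
            (a, b) -> a.getValue().equals(b.getValue()) ? b.getKey().compareTo(a.getKey()) : a.getValue() - b.getValue()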
        );
        for (Map.Entry<String, Integer> entry: map.entrySet()) {
            queue.offer(entry);
            if (queue.size() > k) queue.poll();
        }
        while (!queue.isEmpty()) {
            res.add(0, queue.poll().getKey());
        }
        return res;
    }
}
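
The reversed tie-break in the comparator can look odd at first. A minimal sketch, assuming the same ordering the comparator above is meant to express, shows the order in which the min-heap gives up entries: the weakest candidate leaves first, so reading the poll order backwards yields the final ranking. The class name HeapOrderSketch and the method pollOrder are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical demo class: shows the order in which the min-heap gives up entries.
class HeapOrderSketch {
    static List<String> pollOrder(Map<String, Integer> counts) {
        PriorityQueue<Map.Entry<String, Integer>> queue = new PriorityQueue<>(
            (a, b) -> a.getValue().equals(b.getValue())
                ? b.getKey().compareTo(a.getKey())              // tie: larger word polls first
                : Integer.compare(a.getValue(), b.getValue())   // otherwise: smaller count polls first
        );
        queue.addAll(counts.entrySet());
        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.poll().getKey());
        }
        return order;
    }
}
```

For the counts of Example 1 (i=2, love=2, leetcode=1, coding=1) the poll order is leetcode, coding, love, i. That is why the solution inserts each polled key at index 0 of the result list.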

LintCode version

Problem

Find the top k frequent words with a MapReduce framework.

The mapper's key is the document id and its value is the content of the document; words in a document are separated by spaces.

For the reducer, the output should be at most k key-value pairs: the top k words and their frequencies in this reducer. The judge takes care of merging the results from different reducers into the global top k frequent words, so you don't need to handle that part.

The k is given in the constructor of the TopK class.

Notice

For words with the same frequency, rank them alphabetically.

/**
 * Definition of OutputCollector:
 * class OutputCollector<K, V> {
 *     public void collect(K key, V value);
 *         // Adds a key/value pair to the output buffer
 * }
 * Definition of Document:
 * class Document {
 *     public int id;
 *     public String content;
 * }
 */

Example

Given document A =

lintcode is the best online judge
I love lintcode

and document B =

lintcode is an online judge for coding interview
you can test your code online at lintcode

The top 2 words and their frequencies should be

lintcode, 4
online, 3
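
The example can be checked locally with a rough simulation of the map, reduce and cleanup steps. This is a sketch under stated assumptions: LintCode's real framework, OutputCollector and judge are not modeled, the class name LocalTopKSimulation is hypothetical, and words are split on any whitespace so that the line break inside each document is also treated as a separator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical local simulation; LintCode's real framework and judge are not modeled here.
class LocalTopKSimulation {
    static List<Map.Entry<String, Integer>> topK(List<String> documents, int k) {
        // "Map" phase folded together with the shuffle: count one per non-empty word.
        Map<String, Integer> counts = new HashMap<>();
        for (String content : documents) {
            for (String word : content.split("\\s+")) { // assumption: any whitespace separates words
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        // "Reduce" phase: min-heap of size k, weakest candidate at the head.
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(
            (a, b) -> !a.getValue().equals(b.getValue())
                ? Integer.compare(a.getValue(), b.getValue())
                : b.getKey().compareTo(a.getKey())
        );
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            heap.offer(entry);
            if (heap.size() > k) {
                heap.poll();
            }
        }
        // "Cleanup" phase: drain the heap and reverse so the best pair comes first.
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(0, heap.poll());
        }
        return result;
    }
}
```

Called with documents A and B and k = 2, this returns lintcode with 4 and online with 3, matching the expected output above.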

Solution

import java.util.*;

// Use Pair to store a key-value pair (word, count)
class Pair {
    String key;
    int value;

    Pair(String k, int v) {
        this.key = k;
        this.value = v;
    }
}

public class TopKFrequentWords {

    public static class Map {
        // The key (document id) is unused; "_" is not a valid identifier since Java 9.
        public void map(String key, Document value,
                        OutputCollector<String, Integer> output) {
            // Output the results into output buffer.
            // Ps. output.collect(String key, int value);
            
            String content = value.content;
            String[] words = content.split(" ");
            for (String word : words) {
                if (word.length() > 0) {
                    output.collect(word, 1);
                }
            }
        }
    }

    public static class Reduce {
        private PriorityQueue<Pair> Q = null;
        private int k;

        private Comparator<Pair> pairComparator = new Comparator<Pair>() {
            public int compare(Pair o1, Pair o2) {
                if (o1.value != o2.value) {
                    return o1.value - o2.value;
                }
                //if the values are equal, compare keys
                return o2.key.compareTo(o1.key);
            }
        };

        public void setup(int k) {
            // initialize your data structure here
            this.k = k;
            Q = new PriorityQueue<Pair>(k, pairComparator);
        }

        public void reduce(String key, Iterator<Integer> values) {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next();
            }

            Pair pair = new Pair(key, sum);
            if (Q.size() < k) {
                Q.add(pair);
            } else {
                Pair peak = Q.peek();
                if (pairComparator.compare(pair, peak) > 0) {
                    Q.poll();
                    Q.add(pair);
                }
            }
        }

        public void cleanup(OutputCollector<String, Integer> output) {
            // Output the top k pairs <word, times> into output buffer.
            // Ps. output.collect(String key, Integer value);
            List<Pair> pairs = new ArrayList<Pair>();
            while (!Q.isEmpty()) {
                pairs.add(Q.poll());
            }

            // reverse result
            int n = pairs.size();
            for (int i = n - 1; i >= 0; --i) {
                Pair pair = pairs.get(i);
                output.collect(pair.key, pair.value);
            }
            
            // while (!Q.isEmpty()) {
            //     Pair pair = Q.poll();
            //     output.collect(pair.key, pair.value);
            // }
        }
    }
}
